Pseudo-Anchor Text Extraction for Vertical Search

Authors

  • Shuming Shi
  • Fei Xing
  • Mingjie Zhu
  • Zaiqing Nie
  • Ji-Rong Wen
Abstract

Anchor text plays an especially important role in improving the performance of general Web search. Its importance comes from the fact that it is a fairly objective description of a Web page, contributed by a potentially large number of other Web pages. Vertical search provides indexing and search functionality for objects in a particular domain and is becoming an important supplement to general Web search. It is desirable to utilize anchor text in vertical search as well to improve search performance. However, vertical objects typically lack explicit URLs that accurately identify them, and the anchor text of a vertical object is also hard to acquire explicitly. This paper proposes the concepts of pseudo-URL and pseudo-anchor text for vertical objects, corresponding to the URL and anchor text of a general Web page. To extract and utilize the pseudo-anchor text of vertical objects, we focus on candidate anchor block accumulation and pseudo-anchor extraction. State-of-the-art data integration techniques are used to accumulate candidate anchor blocks belonging to the same object. Pseudo-anchor text for each object is then extracted from its candidate anchor blocks using a machine-learning-based approach. A case study in the academic search domain indicates that our approach can dramatically improve search performance.
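To make the accumulation step concrete, the following is a minimal sketch and not the paper's method. It assumes an academic search setting in which each mention of a paper arrives as a (cited title, surrounding text) pair, and it swaps the data integration techniques named in the abstract for plain title normalization. The functions `pseudo_url` and `accumulate_anchor_blocks` and the sample mentions are hypothetical, and the machine-learning extraction step is not shown.

```python
import re
from collections import defaultdict


def pseudo_url(title: str) -> str:
    """Hypothetical pseudo-URL: a normalized key derived from an object's title.

    Assumption: lowercasing and dropping punctuation is enough to match
    mentions of the same paper. Real data would need fuzzier matching.
    """
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(tokens)


def accumulate_anchor_blocks(mentions):
    """Group candidate anchor blocks by the pseudo-URL of the mentioned object.

    `mentions` is an iterable of (cited_title, context_text) pairs, where
    context_text is the text surrounding the mention (the candidate block).
    """
    blocks = defaultdict(list)
    for cited_title, context in mentions:
        blocks[pseudo_url(cited_title)].append(context)
    return dict(blocks)


# Illustrative data only: two differently formatted mentions of one paper,
# plus one mention of another paper.
mentions = [
    ("Pseudo-Anchor Text Extraction for Vertical Search",
     "Shi et al. extract anchor-like text for vertical objects."),
    ("Pseudo-anchor text extraction for vertical search.",
     "Earlier work introduced pseudo-URLs for papers."),
    ("Mining Anchor Text Trends for Retrieval",
     "Temporal weighting of anchor text was studied."),
]

blocks = accumulate_anchor_blocks(mentions)
for key, contexts in blocks.items():
    print(key, len(contexts))
```

In this sketch the two spellings of the first title collapse to one key, so both contexts become candidate anchor blocks for the same object. Deciding which part of each block to keep as pseudo-anchor text is left to a separate extraction step.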

Similar resources

Anchor Text Extraction for Academic Search

Anchor text plays an especially important role in improving the performance of general Web search, because it is a relatively objective description of a Web page by a potentially large number of other Web pages. Academic Search provides indexing and search functionality for academic articles. It may be desirable to utilize anchor text in academic search as well to improve the search res...

TREC10 Web and Interactive Tracks at CSIRO

Our primary goals in the Web track participation were two-fold: A) to confirm our earlier finding [1] that anchor text is useful in a homepage finding task, and B) to provide an interactive search engine style interface to searching the WT10g data. In addition, three title-only runs were submitted, comparing two different implementations of stemming to unstemmed processing of the raw query. Non...

A New Approach Towards Vertical Search Engines - Intelligent Focused Crawling and Multilingual Semantic Techniques

Search engines typically consist of a crawler which traverses the web retrieving documents and a search frontend which provides the user interface to the acquired information. Focused crawlers refine the crawler by intelligently directing it to predefined topic areas. The evolution of search engines today is expedited by supplying more search capabilities such as a search for metadata as well a...

TREC-10 Web Track Experiments at MSRA

In TREC-10, Microsoft Research Asia (MSRA) participated in the Web track (ad hoc retrieval task and homepage finding task). The latest version of the Okapi system (Windows 2000 version) was used. We focused on the development of content-based retrieval and link-based retrieval, and investigated the suitable combination of the two. For content-based retrieval, we examined the problems of weighting...

Mining Anchor Text Trends for Retrieval

Anchor text has been considered as a useful resource to complement the representation of target pages and is broadly used in web search. However, previous research only uses anchor text of a single snapshot to improve web search. Historical trends of anchor text importance have not been well modeled in anchor text weighting strategies. In this paper, we propose a novel temporal anchor text weig...

Publication date: 2006